Interlingual map task corpus collection
Authors
Abstract
We present a prototype interlingual communication system that we are using to collect a corpus of task-based dialogues between speakers of different languages. This corpus will be used to assess human reactions to an automated speech-to-speech translation system. In this demonstration, we show how the HCRC Map Task can be adapted to support data collection in this interlingual setting, and how we used readily available speech and language technology to rapidly prototype the data-collection system. We also explain the nature and purpose of the data we are collecting.